
    An XML-based Tool for Tracking English Inclusions in German Text

    The use of lexicons and corpora advances both linguistic research and the performance of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web, to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring changing patterns of English inclusion usage. The corpus used for the classification covers three different domains. We report the classification results and illustrate their value to linguistic and NLP research.
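    The core lookup idea can be sketched as follows. This is a minimal illustration under invented assumptions, not the tool itself: the two tiny sets stand in for the English and German lexical databases, and a real system would also consult web-frequency evidence for tokens found in neither lexicon.

```python
# Minimal sketch of lexicon-based English-inclusion detection.
# GERMAN_LEXICON and ENGLISH_LEXICON are toy stand-ins for the
# lexical databases the abstract mentions (illustrative only).

GERMAN_LEXICON = {"die", "zeitung", "berichtet", "über", "das", "neue"}
ENGLISH_LEXICON = {"startup", "update", "die", "software"}

def find_english_inclusions(tokens):
    """Return tokens that look like English inclusions in German text."""
    inclusions = []
    for token in tokens:
        word = token.lower()
        if word in GERMAN_LEXICON:
            # German-first lookup handles interlingual homographs like "die"
            continue
        if word in ENGLISH_LEXICON:
            inclusions.append(token)
    return inclusions

print(find_english_inclusions(
    ["Die", "Zeitung", "berichtet", "über", "das", "Startup", "Update"]))
# → ['Startup', 'Update']
```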

    The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text

    In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.

    Algorithms for Analysing the Temporal Structure of Discourse

    We describe a method for analysing the temporal structure of a discourse which takes into account the effects of tense, aspect, temporal adverbials and rhetorical structure, and which minimises unnecessary ambiguity in the temporal structure. It is part of a discourse grammar implemented in Carpenter's ALE formalism. The method for building up the temporal structure of the discourse combines constraints and preferences: we use constraints to reduce the number of possible structures, exploiting the HPSG type hierarchy and unification for this purpose; and we apply preferences to choose between the remaining options using a temporal centering mechanism. We end by recommending that an underspecified representation of the structure using these techniques be used to avoid generating the temporal/rhetorical structure until higher-level information can be used to disambiguate. (EACL '95)

    Description of the LTG system used for MUC-7

    The basic building blocks in our MUC system are reusable text handling tools which we have been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tool does a very specific job, but can be combined with other tools in a Unix pipeline. Different combinations of the same tools can thus be used in a pipeline for completing different tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose the eXtensible Markup Language (XML). XML is an official, simplified version of Standard Generalised Markup Language (SGML), simplified to make processing easier [3]. We were involved in the development of the XML standard, building on our expertise in the design of our own Normalised SGML (NSL) and NSL tool LT NSL [10], and our XML tool LT XML [11]. A detailed comparison of this SGML-oriented architecture with more traditional database-oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an API for all its access to XML and SGML data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously added markup.
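    The pipeline-of-tools architecture can be sketched as follows. This is a minimal illustration, not the LTG tools themselves: the tool functions and the tag names (<s>, <w>) are invented for the example, but each "tool" reads and writes the shared XML format, and a later tool exploits markup added by an earlier one, just as the description above explains.

```python
# Sketch of single-purpose tools composed over a common XML stream format.
# Tool and tag names are illustrative, not the actual LTG tool set.

import xml.etree.ElementTree as ET

def tokenise(xml_text):
    """Add <w> (word) markup inside each <s> (sentence) element."""
    root = ET.fromstring(xml_text)
    for s in root.iter("s"):
        words, s.text = s.text.split(), None
        for w in words:
            ET.SubElement(s, "w").text = w
    return ET.tostring(root, encoding="unicode")

def tag_capitalised(xml_text):
    """A later tool exploits earlier markup: flag capitalised <w> tokens."""
    root = ET.fromstring(xml_text)
    for w in root.iter("w"):
        if w.text[0].isupper():
            w.set("cap", "yes")
    return ET.tostring(root, encoding="unicode")

doc = "<doc><s>Edinburgh hosts the Language Technology Group</s></doc>"
for tool in (tokenise, tag_capitalised):   # the "pipeline"
    doc = tool(doc)
print(doc)
```

    Because every tool speaks the same syntactic format, the same two functions could be recombined with others in a different order for a different task, which is the point of the constraint on input/output streams.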

    A comparison of parsing technologies for the biomedical domain

    This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms; and automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g., hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained 'preprocessing' techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to 'package up' complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.
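    The paraphrase-based interpretation step can be sketched as follows. The counts below are invented for illustration; in the paper's setting such paraphrase evidence is extracted from the parsed corpus rather than hand-coded.

```python
# Sketch of choosing a compound-nominalisation interpretation from
# paraphrase counts. The counts are invented placeholders; the abstract's
# own examples (hospital arrival vs. patient arrival) are reused here.

PARAPHRASE_COUNTS = {
    ("hospital", "arrival"): {"at": 12, "of": 1},
    ("patient", "arrival"): {"at": 0, "of": 9},
}

def interpret(modifier, head):
    """Pick the best-attested prepositional paraphrase for a compound."""
    counts = PARAPHRASE_COUNTS.get((modifier, head), {})
    if not counts:
        return None            # no corpus evidence for this pair
    prep = max(counts, key=counts.get)
    return f"{head} {prep} {modifier}"

print(interpret("hospital", "arrival"))  # → arrival at hospital
print(interpret("patient", "arrival"))   # → arrival of patient
```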

    Optimising Selective Sampling for Bootstrapping Named Entity Recognition

    Training a statistical named entity recognition system in a new domain requires costly manual annotation of large quantities of in-domain data. Active learning promises to reduce the annotation cost by selecting only highly informative data points. This paper is concerned with a real active learning experiment to bootstrap a named entity recognition system for a new domain of radio astronomical abstracts. We evaluate several committee-based metrics for quantifying the disagreement between classifiers built using multiple views, and demonstrate that the choice of metric can be optimised in simulation experiments with existing annotated data from different domains. A final evaluation shows that we gained substantial savings compared to a randomly sampled baseline.
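    One standard committee-based disagreement metric of this kind is vote entropy, sketched below. The abstract does not name the specific metrics evaluated, so this is an illustrative example only; the label votes and entity tags (ORG, PER, LOC) are invented, and in the multi-view setting each vote would come from a classifier trained on a different view of the data.

```python
# Hedged sketch of vote entropy, a committee-based disagreement metric.
# Votes and tag names below are invented for illustration.

from collections import Counter
from math import log2

def vote_entropy(votes):
    """Entropy of a committee's label votes for one candidate example.
    Higher entropy means more disagreement, hence a more informative
    example to send for manual annotation."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A unanimous committee offers nothing to learn from annotation...
print(vote_entropy(["ORG", "ORG", "ORG"]) == 0.0)  # True
# ...while a split committee marks a good candidate for annotation.
print(vote_entropy(["ORG", "PER", "ORG", "LOC"]))  # 1.5
```

    Selective sampling then amounts to annotating the examples with the highest disagreement scores instead of a random sample.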

    Grounding Gene Mentions with Respect to Gene Database Identifiers

    We describe our submission for task 1B of the BioCreAtIvE competition, which is concerned with grounding gene mentions with respect to databases of organism gene identifiers. Several approaches to gene identification, lookup, and disambiguation are presented. Results are reported for two possible baseline systems, along with a discussion of the sources of precision and recall errors and an estimate of precision and recall for an organism-specific tagger bootstrapped from gene synonym lists and the task 1B training data.
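    The lookup step bootstrapped from a synonym list can be sketched as follows. The synonym table and the FlyBase-style identifiers below are invented placeholders, not BioCreAtIvE task data; real systems would add disambiguation on top of this exact-match lookup.

```python
# Sketch of grounding gene mentions via a normalised synonym lookup.
# The SYNONYMS table is a hypothetical organism-specific synonym list.

import re

SYNONYMS = {
    "FBgn0000490": ["dpp", "decapentaplegic", "DPP-C"],
    "FBgn0003900": ["twist", "twi"],
}

def normalise(s):
    """Lower-case and strip punctuation so 'DPP-C' matches 'dppc'."""
    return re.sub(r"[^a-z0-9]", "", s.lower())

# invert the table: normalised synonym -> gene identifier
LOOKUP = {normalise(syn): gid for gid, syns in SYNONYMS.items() for syn in syns}

def ground(mention):
    """Return the gene identifier for a text mention, or None if unknown."""
    return LOOKUP.get(normalise(mention))

print(ground("Dpp-C"))   # → FBgn0000490
print(ground("Twi"))     # → FBgn0003900
```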